Multi-Aspect Tagging for Collaborative Structuring

Authors

  • Katharina Morik
  • Michael Wurst
Abstract

Local tag structures have become common with Web 2.0: users "tag" their data without specifying the underlying semantics. Every user annotates items in an individual way, using his or her own labels. Even if two users happen to use a tag with the same name, it need not mean the same thing. Moreover, within the collection of a single user, media items are tagged multiple times under different aspects, e.g., topic, genre, occasion, mood. Again, several users applying the same name to an aspect does not imply that the same aspect is actually meant. Nevertheless, users could benefit from the tagging work of others (folksonomies). The set of items clustered together by the same label in one user’s collection forms a pattern. Knowing this pattern is informative for another user. In contrast to other cluster ensemble methods or distributed clustering, a global model (consensus) is not the aim. Each user wants to keep the tags already assigned, wants to keep the diverse aspects under which the items were organized, and only wishes to enhance his or her own structure with those of others. A clustering algorithm that structures items has to take into account the local, multi-aspect nature of the task structures. The LACE algorithm [9] is such a clustering algorithm.

1 Local Patterns in Parallel Universes

Large collections of documents, music, and videos are stored today on personal computers. The collections can be structured by a general scheme, as is done, e.g., by iTunes, where artist, album, genre, and date annotate the music collection. The semantic web has focused on structuring text collections using general ontologies. Users additionally structure their collections according to several aspects of personal concern. The broad general schemes are enhanced by personal, more specific aspects, e.g., mood or time of day for structuring music, or more finely grained topic structures for text collections. These enhancements are local, i.e., they are the personal view of a certain user who does not aim at a global structure for all users. We might call a set of (media) items that are clustered together by a user (i.e., tagged with the same label) a local pattern. All these patterns of a user are structured by aspects to form this user’s universe. Since many users build up their universes in parallel, there exist many parallel universes.

Dagstuhl Seminar Proceedings 07181, Parallel Universes and Local Patterns, http://drops.dagstuhl.de/opus/volltexte/2007/1263

While users tend to start organizing their personal collections eagerly, they often end up with a large set of items that are not yet annotated and a structure that is too coarse. Facing this situation, we ask how machine learning techniques could help the users. If there are enough annotated items, classification learning on one user’s collection can help. It delivers a decision function φ which maps items x of the domain X to a class g in a set of classes G. New items will be classified as soon as they come in, and the user no longer bears the burden of annotation. However, classification does not refine the structure.

Classification: The input is I = {φ : S → G}, where S ⊆ X represents the training examples and φ their mapping to the set of predefined classes G. The task is to output exactly one function O = {φ : X → G} that is able to assign all possible objects to exactly one class in G. The classification setting can be extended to hierarchical classification with I = {φ : S → 2^G} and O = {φ : X → 2^G}.

If the classification is trained on the local patterns of one user, it delivers a local model for each local pattern. Exploiting other users’ local models could be performed by classifier ensembles. In this case, all the parallel universes together classify new items in a consensus model. For our application, this has several disadvantages. First, the consensus model destroys the specific, individual structure of a user’s collection in the long run. Second, the structure is not refined, because the classes are predefined.

Classifier Ensembles: The input is now a set of mappings I ⊆ {φ | φ : S → G}, but the output is still a single function O = {φ : X → G}. Again, the setting can be extended to hierarchical classifier ensembles with I ⊆ {φ | φ : S → 2^G} and O = {φ : X → 2^G}.

If there is no structure given yet, clustering is the method of choice. It creates a structure of groups G for the not yet annotated items S ⊆ X. However, it does not take into account the structure which the user has already built up. Semi-supervised clustering obeys given groupings [3]. Note that clustering does not predict previously unseen items as classification does. Hence, the domain of the function φ is not X but S. In supervised clustering, the input function constrains the output function, where both are defined on the same domain. However, supervised clustering does not refine structures.

Supervised Clustering: In contrast to traditional clustering, a mapping of some objects x ∈ S ⊆ X to their clusters g ∈ G is input, I = {φ : S → G}. The output is O = {φ : S → G} or, for the hierarchical case, O = {φ : S → 2^G}.

Supervised clustering may deliver a local model for a user’s local patterns with as well as without any support from other users. We may consider the structuring achieved so far as a set of partitionings φi, each mapping a subset of the given items to a set of groups Gi. For instance,
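
The notions of a partitioning φi, a local pattern, and a consensus model can be made concrete with a small Python sketch. This is only an illustration under stated assumptions, not the LACE algorithm: each partitioning is assumed to be a plain dictionary from item to tag, and all item names, tag names, and the helper functions `local_patterns` and `consensus` are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical toy universes: each dict is one partial mapping phi_i : S_i -> G_i.
alice_genre = {"song1": "rock", "song2": "rock", "song3": "jazz"}
alice_mood = {"song1": "happy", "song3": "calm"}  # a second aspect of the same user
bob_genre = {"song1": "rock", "song3": "rock", "song4": "jazz"}
carol_genre = {"song3": "rock", "song4": "jazz"}


def local_patterns(phi):
    """Invert phi: each tag maps to the set of items carrying it (a local pattern)."""
    patterns = defaultdict(set)
    for item, tag in phi.items():
        patterns[tag].add(item)
    return dict(patterns)


def consensus(phis, item):
    """Ensemble-style output: one tag per item, chosen by majority vote.

    Mappings that do not cover the item abstain; None is returned if nobody voted.
    """
    votes = Counter(phi[item] for phi in phis if item in phi)
    return votes.most_common(1)[0][0] if votes else None


# The same tag name yields different local patterns in two universes.
alice_rock = local_patterns(alice_genre)["rock"]
bob_rock = local_patterns(bob_genre)["rock"]
print(alice_rock == bob_rock)

# A consensus vote can override one user's own annotation of an item.
print(alice_genre["song3"], consensus([alice_genre, bob_genre, carol_genre], "song3"))
```

In this toy setting the majority vote relabels an item that Alice had already tagged herself, which is the kind of loss of individual structure the text attributes to consensus models. Her two aspects (`alice_genre`, `alice_mood`) are also kept as separate mappings rather than merged into one.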

Similar resources

Paper-Centric Interaction Concepts for Collaborative Learning

Field studies show that in many learning settings paper has intrinsic advantages over electronic documents. In this paper we present concepts for the collaborative annotation and structuring of paper documents and digital documents in both distributed and co-located settings. The CoScribe prototype supports the annotation of printed lecture slides and collaborative sharing of annotations. Digit...

SemKey: A Semantic Collaborative Tagging System

By analysing the current structure and the usage patterns of collaborative tagging systems, we can find out many important aspects which still need to be improved. Problems related to synonymy, polysemy, different lexical forms, misspelling errors or alternate spellings, different levels of precision and different kinds of tag-to-resource association cause inconsistencies and reduce the efficien...

Evaluating the Benefits of Social Annotation for Collaborative Search

In this paper we present the results of a user study on the usage of social annotation features – sharing, rating, commenting and tagging – in collaborative search processes. Making use of our resource sharing system, LearnWeb2.0, our participants collaboratively searched for resources on a specific theme. The findings show that there is an imbalance between what users share and what they searc...

Sharing vocabularies: tag usage in CiteULike

CiteULike is a collaborative tagging web site which lets users enter academic references into a database and describe these references using tags (categorizations of their own choosing). We looked at the tagging behavior of people who were describing four frequently entered references. We found that while people tend to agree on a few select tags, people also tend to use many variants of these ...

Recommending in Social Tagging Systems based on Kernelized Multiway Analysis

Along with the new opportunities introduced by Web 2.0 and collaborative tagging systems, several challenges have to be addressed too, notably the problem of information overload. Recommender systems are among the most successful approaches for increasing the level of relevant content over the “noise”. Traditional recommender systems fail to address the requirements presented in collaborative t...

Consensus Dynamics in a non-deterministic Naming Game with Shared Memory

In the naming game, individuals or agents exchange pairwise local information in order to communicate about objects in their common environment. The goal of the game is to reach a consensus about naming these objects. Originally used to investigate language formation and self-organizing vocabularies, we extend the classical naming game with a globally shared memory accessible by all agents. Thi...

Publication date: 2007